Week 7
Milestones
- Ran experiments with CLIP:
  - Drew confusion matrices comparing the classification errors of CLIP and Tesseract; result: CLIP performs better.
- Set up a git repository, shrivastava95/clip-ocr, for fine-tuning CLIP on a given dataset.
- Created a dataset of cropped word images from some pages of en-or.pdf (one plausible cropping pipeline is sketched after this list).
- Implemented the baseline zero-shot approach (second sketch below).
- Implemented CoOp (https://arxiv.org/abs/2109.01134) as a cheaper alternative to fine-tuning CLIP (third sketch below).
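The report doesn't say how the word crops were produced, so the following is only a minimal sketch of one plausible pipeline: rasterize the PDF with pdf2image and crop Tesseract's word boxes. All paths and output names besides en-or.pdf are illustrative.

```python
import os
from pdf2image import convert_from_path  # requires poppler installed
import pytesseract

# Hypothetical output directory; not taken from the report.
os.makedirs("word_crops", exist_ok=True)
pages = convert_from_path("en-or.pdf", dpi=300)

n, labels = 0, []
for page in pages:
    # Tesseract returns per-word bounding boxes and confidences.
    data = pytesseract.image_to_data(page, output_type=pytesseract.Output.DICT)
    for i, word in enumerate(data["text"]):
        if word.strip() and int(float(data["conf"][i])) > 0:
            l, t = data["left"][i], data["top"][i]
            w, h = data["width"][i], data["height"][i]
            page.crop((l, t, l + w, t + h)).save(f"word_crops/{n:05d}.png")
            labels.append(word)
            n += 1

with open("word_crops/labels.txt", "w") as f:
    f.write("\n".join(labels))
```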
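A minimal sketch of the baseline zero-shot approach, assuming the openai/CLIP package: each cropped word image is scored against a candidate vocabulary, and the text with the highest cosine similarity wins. The vocabulary, prompt template, and file name here are assumptions, not taken from the report.

```python
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Illustrative candidate words and prompt template.
vocab = ["hello", "world", "example"]
text_tokens = clip.tokenize([f'a photo of the word "{w}"' for w in vocab]).to(device)
image = preprocess(Image.open("word_crop.png")).unsqueeze(0).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text_tokens)
    # Normalize so the dot product equals cosine similarity.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    sims = (image_features @ text_features.T).squeeze(0)

print("predicted word:", vocab[sims.argmax().item()])
```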
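A condensed sketch of the CoOp idea from the linked paper: the hand-written prompt is replaced by learnable context vectors prepended to each class name's token embeddings, optimized while all of CLIP stays frozen. This follows the structure of the official CoOp reference code, but details such as n_ctx and the initialization scale are illustrative.

```python
import torch
import torch.nn as nn
import clip

class CoOpTextEncoder(nn.Module):
    """Runs CLIP's frozen text transformer on prompts whose first n_ctx
    token embeddings are learnable context vectors (arXiv:2109.01134)."""
    def __init__(self, clip_model, classnames, n_ctx=16, device="cpu"):
        super().__init__()
        dtype = clip_model.dtype
        ctx_dim = clip_model.ln_final.weight.shape[0]
        # Learnable context vectors, shared across all classes.
        self.ctx = nn.Parameter(torch.randn(n_ctx, ctx_dim, dtype=dtype) * 0.02)

        # Tokenize "X X ... X <classname>" so the sequence layout matches CLIP's.
        prompts = [" ".join(["X"] * n_ctx) + " " + name for name in classnames]
        tokenized = torch.cat([clip.tokenize(p) for p in prompts]).to(device)
        with torch.no_grad():
            embedding = clip_model.token_embedding(tokenized).type(dtype)

        # Frozen pieces: the SOS embedding and the class-name + EOT suffix.
        self.register_buffer("prefix", embedding[:, :1, :])
        self.register_buffer("suffix", embedding[:, 1 + n_ctx:, :])
        self.register_buffer("eot_idx", tokenized.argmax(dim=-1))
        self.clip_model = clip_model

    def forward(self):
        n_cls = self.prefix.shape[0]
        ctx = self.ctx.unsqueeze(0).expand(n_cls, -1, -1)
        x = torch.cat([self.prefix, ctx, self.suffix], dim=1)
        x = x + self.clip_model.positional_embedding.type(x.dtype)
        x = x.permute(1, 0, 2)                 # (seq, batch, dim) for CLIP
        x = self.clip_model.transformer(x)
        x = x.permute(1, 0, 2)
        x = self.clip_model.ln_final(x).type(x.dtype)
        # Take features at the EOT token and project, as CLIP itself does.
        return x[torch.arange(n_cls), self.eot_idx] @ self.clip_model.text_projection
```

Training then optimizes only `self.ctx`: encode images with the frozen image encoder, take logits as `logit_scale * image_feats @ text_feats.T`, and minimize cross-entropy against the word labels.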
Screenshots / Videos
Contributions
Learnings
- Learnt about OpenAI's CLIP model, a zero-shot model for measuring semantic similarity between image and text pairs.
- Similarity is computed as the cosine similarity between the image and text embeddings after both are projected into a shared embedding space (a minimal sketch follows this list).
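A few lines making this concrete: once both modalities are projected into the shared space, the image-text score matrix used in the experiments above is just a matrix product of normalized embeddings. The function name and shapes are illustrative.

```python
import torch

def pairwise_cosine(image_embs: torch.Tensor, text_embs: torch.Tensor) -> torch.Tensor:
    """Cosine similarity of every (image, text) pair.
    image_embs: (N, D), text_embs: (M, D) -> similarity matrix of shape (N, M)."""
    image_embs = image_embs / image_embs.norm(dim=-1, keepdim=True)
    text_embs = text_embs / text_embs.norm(dim=-1, keepdim=True)
    return image_embs @ text_embs.T
```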